CMS software deployment on OSG
A set of software deployment tools has been developed for the installation,
verification, and removal of a CMS software release. The tools, which mainly target
deployment on the OSG, provide instant release deployment, corrective resubmission
of the initial installation job, and an independent web-based deployment portal with a
Grid Security Infrastructure login mechanism. We have performed over 500 installations
and found the tools reliable and adaptable enough to cope with problems arising from
changes in the Grid computing environment and in the software releases. We present
the design of the tools, statistics gathered during their operation, and our experience
with CMS software deployment on the OSG Grid computing environment.
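The tools themselves are not reproduced here; as a rough illustration of the "corrective resubmission" idea, the sketch below retries a failed installation job a bounded number of times. The `cms-install` command, its options, and the retry limit are hypothetical placeholders, not the actual tool interface.

```python
"""Minimal sketch of corrective resubmission: submit an installation job,
check the result, and resubmit on failure. All names are illustrative."""
import subprocess

MAX_ATTEMPTS = 3  # illustrative retry limit


def install_release(site, release):
    """Submit one installation job for `release` at `site`; return True on success."""
    # Hypothetical installer command; the real tools wrap Grid job submission.
    result = subprocess.run(["cms-install", "--site", site, "--release", release])
    return result.returncode == 0


def deploy_with_resubmission(site, release):
    """Retry the installation a few times before giving up."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if install_release(site, release):
            return True
        print(f"attempt {attempt} failed at {site}, resubmitting")
    return False
```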
Designing Computing System Architecture and Models for the HL-LHC era
This paper describes a programme to study the computing model in CMS after the next
long shutdown, near the end of the decade.
Comment: Submitted to proceedings of the 21st International Conference on Computing
in High Energy and Nuclear Physics (CHEP2015), Okinawa, Japan
Characterizing network paths in and out of the clouds
Commercial Cloud computing is becoming mainstream, with funding agencies moving
beyond prototyping and starting to fund production campaigns, too. An important
aspect of any scientific computing production campaign is data movement, both
incoming and outgoing. And while the performance and cost of VMs are relatively
well understood, network performance and cost are not. This paper provides a
characterization of networking in various regions of Amazon Web Services,
Microsoft Azure and Google Cloud Platform, both between Cloud resources and major
data transfer nodes (DTNs) in the Pacific Research Platform, including OSG data
federation caches in the network backbone, and inside the clouds themselves. The
paper contains a qualitative analysis of the results as well as latency and
throughput measurements. It also includes an analysis of the costs involved with
Cloud-based networking.
Comment: 7 pages, 1 figure, 5 tables, to be published in CHEP19 proceedings
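The measurement scripts are not part of the abstract; the following is a minimal sketch of the kind of latency and throughput probe such a characterization relies on, assuming `ping` and `iperf3` are available and an iperf3 server is already running on each target. The endpoint hostnames are placeholders, not the actual cloud VMs or PRP DTNs used in the paper.

```python
"""Minimal sketch of a latency/throughput probe between network endpoints."""
import json
import re
import subprocess

# Hypothetical endpoints: a cloud VM and a PRP data transfer node (DTN).
ENDPOINTS = ["vm.example-cloud-region.net", "dtn.example-prp-site.edu"]


def measure_latency(host, count=10):
    """Return the average round-trip time in ms, using the system ping utility."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    # Summary line looks like: rtt min/avg/max/mdev = 0.1/0.2/0.3/0.0 ms
    match = re.search(r"= [\d.]+/([\d.]+)/", out)
    return float(match.group(1)) if match else None


def measure_throughput(host, seconds=10):
    """Return achieved throughput in Gbps from an iperf3 server running on `host`."""
    out = subprocess.run(["iperf3", "-c", host, "-t", str(seconds), "-J"],
                         capture_output=True, text=True, check=True).stdout
    report = json.loads(out)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    for host in ENDPOINTS:
        print(f"{host}: {measure_latency(host)} ms, {measure_throughput(host):.2f} Gbps")
```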
Running a Pre-Exascale, Geographically Distributed, Multi-Cloud Scientific Simulation
As we approach the Exascale era, it is important to verify that the existing
frameworks and tools will still work at that scale. Moreover, public Cloud computing
has been emerging as a viable solution for both prototyping and urgent computing.
Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor
setup for running a scientific simulation in the Cloud, the chosen application being
IceCube's photon propagation simulation. That is, this was not purely a demonstration
run; it was also used to produce valuable and much-needed scientific results for the
IceCube collaboration. In order to reach the desired scale, we aggregated GPU
resources across 8 GPU models from many geographic regions across Amazon Web
Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we
reached a peak of over 51k GPUs, corresponding to almost 380 single-precision
PFLOPS (PFLOP32s), for a total integrated compute of about 100k GPU hours. In this
paper we provide a description of the setup, the problems that were discovered and
overcome, and a short description of the actual science output of the exercise.
Comment: 18 pages, 5 figures, 4 tables, to be published in Proceedings of ISC
High Performance 202
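As a rough sanity check of the headline numbers (over 51k GPUs, almost 380 PFLOP32s), the sketch below sums approximate vendor peak single-precision figures over an illustrative GPU mix. The per-model counts are assumptions chosen only to total 51k GPUs; they are not the run's actual composition, and the result is just in the same ballpark as the reported figure.

```python
"""Back-of-the-envelope aggregation of peak fp32 compute over a GPU mix."""

# Approximate vendor peak single-precision TFLOPS per GPU model (datasheet values).
PEAK_TFLOP32 = {
    "V100": 14.0, "P100": 9.3, "P40": 11.8, "P4": 5.5,
    "T4": 8.1, "M60": 4.8, "K80": 4.4, "K520": 2.4,
}

# Hypothetical mix summing to 51k GPUs (assumption for illustration only).
mix = {"V100": 6000, "P100": 5000, "P40": 2000, "P4": 4000,
       "T4": 16000, "M60": 5000, "K80": 11000, "K520": 2000}

total_gpus = sum(mix.values())
total_pflop32 = sum(count * PEAK_TFLOP32[model] for model, count in mix.items()) / 1000.0
print(f"{total_gpus} GPUs -> ~{total_pflop32:.0f} PFLOP32s peak (illustrative mix)")
```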
The Scalable Systems Laboratory: a Platform for Software Innovation for HEP
The Scalable Systems Laboratory (SSL), part of the IRIS-HEP Software
Institute, provides Institute participants and HEP software developers
generally with a means to transition their R&D from conceptual toys to testbeds
to production-scale prototypes. The SSL enables tooling, infrastructure, and
services supporting the innovation of novel analysis and data architectures,
development of software elements and tool-chains, reproducible functional and
scalability testing of service components, and foundational systems R&D for
accelerated services developed by the Institute. The SSL is constructed with a
core team having expertise in scale testing and deployment of services across a
wide range of cyberinfrastructure. The core team embeds and partners with other
areas in the Institute, and with LHC and other HEP development and operations
teams as appropriate, to define investigations and required service deployment
patterns. We describe the approach and experiences with early application
deployments, including analysis platforms and intelligent data delivery
systems.
Moving the California distributed CMS xcache from bare metal into containers using Kubernetes
The University of California system has excellent networking between all of its
campuses as well as to a number of other universities in California, including
Caltech, most of them connected at 100 Gbps. UCSD and Caltech have thus joined their
disk systems into a single logical xcache system, with worker nodes from both sites
accessing data from disks at either site. This setup has been in place for a couple
of years now and has proven to work very well. Coherently managing nodes at multiple
physical locations has, however, not been trivial, and we have been looking for ways
to improve operations. With the Pacific Research Platform (PRP) now providing a
Kubernetes resource pool spanning resources in the Science DMZs of all the UC
campuses, we have recently migrated the xcache services from bare-metal hosts into
containers. This paper presents our experience in both migrating to and operating in
the new environment.
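The paper's actual manifests are not included in the abstract; as a minimal sketch, the snippet below declares an xcache service as a Kubernetes Deployment through the official Python client. The namespace, container image, resource requests, and port (xrootd's default 1094) are illustrative assumptions, not the PRP configuration.

```python
"""Minimal sketch of declaring an xcache service as a Kubernetes Deployment."""
from kubernetes import client, config


def make_xcache_deployment(name="xcache", image="opensciencegrid/xcache:stable"):
    """Build a single-replica Deployment object for an xcache container."""
    container = client.V1Container(
        name=name,
        image=image,  # assumed image name, for illustration only
        ports=[client.V1ContainerPort(container_port=1094)],  # default xrootd port
        resources=client.V1ResourceRequirements(
            requests={"cpu": "4", "memory": "16Gi"},  # illustrative sizing
        ),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": name}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": name}),
        template=template,
    )
    return client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name=name),
        spec=spec,
    )


if __name__ == "__main__":
    config.load_kube_config()  # use local kubeconfig credentials
    apps = client.AppsV1Api()
    apps.create_namespaced_deployment(namespace="osg-xcache", body=make_xcache_deployment())
```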
Parallelized and Vectorized Tracking Using Kalman Filters with CMS Detector Geometry and Events
The High-Luminosity Large Hadron Collider at CERN will be characterized by
greater pileup of events and higher occupancy, making the track reconstruction
even more computationally demanding. Existing algorithms at the LHC are based
on Kalman filter techniques with proven excellent physics performance under a
variety of conditions. Starting in 2014, we have been developing
Kalman-filter-based methods for track finding and fitting adapted for many-core
SIMD processors that are becoming dominant in high-performance systems.
This paper summarizes the latest extensions to our software that allow it to
run on the realistic CMS-2017 tracker geometry using CMSSW-generated events,
including pileup. The reconstructed tracks can be validated against either the
CMSSW simulation that generated the hits, or the CMSSW reconstruction of the
tracks. In general, the code's computational performance has continued to
improve while the above capabilities were being added. We demonstrate that the
present Kalman filter implementation is able to reconstruct events with
comparable physics performance to CMSSW, while providing generally better
computational performance. Further plans for advancing the software are
discussed.
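The vectorized implementation itself is not shown in the abstract; for orientation, the following is a generic, minimal NumPy sketch of the single Kalman filter predict-and-update step that such track fitting repeats layer by layer. It illustrates the underlying technique only, not the paper's SIMD-optimized code.

```python
"""Generic single Kalman filter predict/update step, as used in track fitting."""
import numpy as np


def kalman_step(x, P, F, Q, H, R, z):
    """One predict + update step.
    x: state estimate, P: state covariance, F: propagation (transport) matrix,
    Q: process noise, H: measurement projection, R: measurement noise, z: measured hit."""
    # Predict: propagate the state and its covariance to the next detector layer.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: combine the prediction with the measured hit.
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```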
Hadoop distributed file system for the Grid
Data distribution, storage and access are essential to CPU-intensive and data-intensive high-performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been taken to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained high inter-DataNode data transfer rates can be achieved while the cluster is fully loaded with data-processing jobs. WAN transfer into HDFS, supported by BeStMan and tuned GridFTP servers, demonstrates the scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, as demonstrated at the Large Hadron Collider (LHC) computing centers. The simplicity of operating an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage compared to alternative solutions.
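The abstract mentions remote client access and automatic replication; the sketch below shows what that looks like from an interactive machine using the standard `hdfs dfs` command-line interface. The paths and replication factor are illustrative, not the site's actual configuration.

```python
"""Minimal sketch of remote HDFS access and replication control from an interactive node."""
import subprocess


def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its stdout."""
    result = subprocess.run(["hdfs", "dfs", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Copy a local file into the SE namespace (hypothetical path).
hdfs("-put", "events.root", "/store/user/example/events.root")

# Raise the replication factor for a precious dataset and wait for completion.
hdfs("-setrep", "-w", "3", "/store/user/example/events.root")

# List the directory to confirm the file is present.
print(hdfs("-ls", "/store/user/example"))
```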
HEP Community White Paper on Software trigger and event reconstruction
Realizing the physics programs of the planned and upgraded high-energy
physics (HEP) experiments over the next 10 years will require the HEP community
to address a number of challenges in the area of software and computing. For
this reason, the HEP software community has engaged in a planning process over
the past two years, with the objective of identifying and prioritizing the
research and development required to enable the next generation of HEP
detectors to fulfill their full physics potential. The aim is to produce a
Community White Paper which will describe the community strategy and a roadmap
for software and computing research and development in HEP for the 2020s. The
topics of event reconstruction and software triggers were considered by a joint
working group and are summarized together in this document.
Comment: Editors Vladimir Vava Gligorov and David Lange